A complete and efficient CUDA-sharing solution for HPC clusters

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A complete and efficient CUDA-sharing solution for HPC clusters

In this paper we detail the key features, architectural design, and implementation of rCUDA, an advanced framework to enable remote and transparent GPGPU acceleration in HPC clusters. rCUDA allows decoupling GPUs from nodes, forming pools of shared accelerators, which brings enhanced flexibility to cluster configurations. This opens the door to configurations with fewer accelerators than nodes,...

متن کامل

CUDA-For-Clusters: A System for Efficient Execution of CUDA Kernels on Multi-core Clusters

Rapid advancements in multi-core processor architectures along with low-cost, low-latency, high-bandwidth interconnects have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs to efficiently utilize all the resources in such a cluster is still a major challenge. Programmers have to manually deal with low-level details that should idea...

متن کامل

Efficient and Scalable Operating System Provisioning for HPC Clusters

Operating system provisioning is a common and critical task in cluster computing environments. The required low-level operations involved in provisioning can drastically decrease the performance of a given solution, and maintaining a reasonable provisioning time on clusters of 1000+ nodes is a significant challenge. We present Kadeploy3, a tool built to efficiently and reliably deploy a large n...

متن کامل

Productivity in HPC Clusters

This presentation discusses HPC productivity in terms of: (1) effective architectures, (2) parallel programming models, and (3) applications development tools. The demands placed on HPC by owners and users of systems ranging from public research laboratories to private scientific and engineering companies enrich the topic with many competing technologies and approaches. Rather than expecting to...

متن کامل

Efficient CUDA

Recent work in deep learning has created a voracious demand for more compute cycles, but with the slowdown of Moore’s law, CPUs cannot keep up with the demand. Thus, attention has turned to massively-parallel hardware accelerators, ranging from new processing units designed specifically for deep learning to the repurposing of existing technologies like Fieldprogrammable gate arrays (FPGAs) and ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Parallel Computing

سال: 2014

ISSN: 0167-8191

DOI: 10.1016/j.parco.2014.09.011